Dysphagia is one of the early symptoms of Parkinson's disease (PD). Most existing methods use feature selection to find an optimal subset of speech features for all PD patients in order to improve prediction performance. Few consider the heterogeneity between patients, which implies that personalized prediction models are needed for different patients. However, building such a prediction model for each patient faces the challenge of a small sample size, which limits its generalization ability. Instance transfer is an effective way to compensate for this deficiency. Therefore, this paper proposes a patient-specific game-based transfer (PSGT) method for PD severity prediction. First, a selection mechanism is used to choose PD patients from the source domain whose disease trends are similar to the target patient's, which greatly narrows the scope of instance transfer and reduces the risk of negative transfer. Then, the contributions of the transferred subjects and their instances to the disease estimation of the target subject are fairly evaluated via Shapley values, which improves the interpretability of the method. Next, the proportion of valid instances is determined according to the contributions of the transferred subjects, and instances with higher contributions are selected according to this proportion, further reducing the discrepancy between the transferred instance subset and the target subject. Finally, the selected instance subset is added to the training set of the target subject, and the expanded data are fed into a random forest to improve the performance of the PD severity prediction method. The Parkinson's telemonitoring dataset is used to evaluate the feasibility and effectiveness. Experimental results show that the proposed PSGT method achieves better performance in both prediction error and stability than the compared methods.
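The pipeline above (select similar subjects, score transferred instances, keep the high-contribution fraction, extend the training set) can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: a leave-one-in score stands in for the Shapley-value evaluation, a tiny k-NN regressor stands in for the random forest, and all data and the 0.67 keep-fraction are hypothetical.

```python
import math

def similarity(a, b):
    """Cosine similarity between two disease-trend vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn_predict(train, x, k=3):
    """Tiny k-NN regressor standing in for the random forest."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def contribution(target_train, instance, val):
    """Leave-one-in score: validation-error reduction from adding one
    transferred instance (a cheap proxy for the Shapley-value evaluation)."""
    def err(train):
        return sum(abs(knn_predict(train, x) - y) for x, y in val) / len(val)
    return err(target_train) - err(target_train + [instance])

# Step 1: select the source subject whose disease trend is most similar.
trend_target = [1.0, 1.2, 3.0]
trends = {"s1": [1.3, 1.4, 2.9], "s2": [5.0, 0.1, 0.2]}
chosen = max(trends, key=lambda s: similarity(trend_target, trends[s]))

# Steps 2-4: score the chosen subject's (feature, severity) instances,
# keep the high-contribution fraction, and extend the training set.
target_train = [(0.1, 1.0), (0.2, 1.2), (0.9, 3.0)]
val = [(0.15, 1.1), (0.8, 2.8)]
source = [(0.25, 1.3), (0.85, 2.9), (0.5, 9.0)]  # the last instance is noisy
ranked = sorted(source, key=lambda inst: contribution(target_train, inst, val), reverse=True)
kept = ranked[: int(0.67 * len(source))]
extended_train = target_train + kept
```

The noisy source instance gets a negative contribution and is filtered out before the target subject's training set is extended.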
Despite the high global prevalence of hepatic steatosis, no automated diagnostic method has demonstrated generalizability in detecting steatosis across multiple international datasets. Traditionally, hepatic steatosis detection relies on clinicians selecting a region of interest (ROI) on computed tomography (CT) to measure liver attenuation. ROI selection demands time and expertise and is therefore not routinely performed in populations. To automate the process, we validated an existing artificial intelligence (AI) system for 3D liver segmentation and used it to propose a novel method, AI-ROI, which can automatically select the ROI for attenuation measurements. The AI segmentation and the AI-ROI method were evaluated on 1,014 non-contrast enhanced chest CT images from eight international datasets: LIDC-IDRI, NSCLC-Lung1, RIDER, VESSEL12, RICORD-1A, RICORD-1B, COVID-19-Italy, and COVID-19-China. AI segmentation achieved a mean Dice coefficient of 0.957. Attenuations measured by AI-ROI showed no significant differences from expert measurements (p = 0.545) and a 71% reduction in time. The area under the curve (AUC) of AI-ROI for steatosis classification is 0.921 (95% CI: 0.883 - 0.959). If performed as a routine screening method, our AI protocol could potentially allow early non-invasive, non-pharmacological preventative interventions for hepatic steatosis. The 1,014 expert-annotated liver segmentations of patients with hepatic steatosis annotations can be downloaded here: https://drive.google.com/drive/folders/1-g_zJeAaZXYXGqL1OeF6pUjr6KB0igJX.
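The attenuation-based classification step can be sketched in a few lines. This is an illustrative assumption, not the paper's protocol: the ~40 HU cutoff is a common non-contrast CT rule of thumb for steatosis, and the mask/voxel data here are hypothetical.

```python
def mean_attenuation(ct_hu, liver_mask):
    """Mean attenuation (in Hounsfield units) over voxels inside the mask."""
    vals = [hu for hu, inside in zip(ct_hu, liver_mask) if inside]
    return sum(vals) / len(vals)

def is_steatotic(mean_hu, threshold_hu=40.0):
    """Rule-of-thumb non-contrast CT criterion: mean liver attenuation
    below ~40 HU suggests hepatic steatosis. The threshold here is an
    illustrative assumption, not taken from the paper."""
    return mean_hu < threshold_hu

ct_hu = [55.0, 60.0, 35.0, 30.0, -700.0, 20.0]  # flattened voxel values (HU)
liver_mask = [1, 1, 1, 1, 0, 0]                 # hypothetical AI liver segmentation
mean_hu = mean_attenuation(ct_hu, liver_mask)
```

Automating the mask (and hence the ROI) is what removes the manual ROI-selection bottleneck described above.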
Bayesian optimization (BO) is a widely popular approach for hyperparameter optimization (HPO) of machine learning algorithms. At its core, BO iteratively evaluates promising configurations until a user-defined budget, such as wall-clock time or number of iterations, is exhausted. While the final performance after tuning heavily depends on the provided budget, it is hard to pre-specify an optimal value in advance. In this work, we propose an effective and intuitive termination criterion for BO that automatically stops the procedure if it is sufficiently close to the global optimum. Across an extensive range of real-world HPO problems, we show that our termination criterion achieves better test performance compared to existing baselines from the literature, such as stopping when the probability of improvement drops below a fixed threshold. We also provide evidence that, compared to our method, these baselines are highly sensitive to the choice of their own hyperparameters. In addition, we find that overfitting may occur in the context of HPO, which is arguably an overlooked problem in the literature, and show that our termination criterion mitigates this phenomenon on both small and large datasets.
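A confidence-bound-based stopping rule in this spirit can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's exact criterion (which would also account for observation noise); the `beta` and `eps` values and the posterior means/stds are hypothetical.

```python
def regret_bound(means, stds, beta=2.0):
    """Upper bound on simple regret from a GP posterior over candidate
    configurations: the gap between the best upper confidence bound and
    the best lower confidence bound. When this gap is small, no candidate
    can be much better than the current incumbent."""
    ucb = [m + beta * s for m, s in zip(means, stds)]
    lcb = [m - beta * s for m, s in zip(means, stds)]
    return max(ucb) - max(lcb)

def should_stop(means, stds, eps=0.05, beta=2.0):
    """Terminate BO once the regret bound falls below a tolerance eps."""
    return regret_bound(means, stds, beta) <= eps

early = should_stop([0.60, 0.70, 0.50], [0.20, 0.30, 0.25])    # uncertain: keep going
late = should_stop([0.70, 0.69, 0.71], [0.005, 0.004, 0.006])  # converged: stop
```

Unlike a fixed iteration budget, such a rule adapts to how quickly the posterior concentrates around the optimum.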
Deep learning methods have brought many breakthroughs to computer vision, especially in 2D face recognition. However, the bottleneck of deep-learning-based 3D face recognition is that it is difficult to collect millions of 3D faces, whether for industry or academia. In view of this situation, many methods generate more 3D faces from existing ones via 3D face data augmentation, which are then used to train deep 3D face recognition models. However, to the best of our knowledge, no method generates 3D faces from 2D face images for training deep 3D face recognition models. This letter focuses on the role of reconstructed 3D facial surfaces in 3D face recognition and proposes a framework of 2D-aided deep 3D face recognition. In particular, we propose to reconstruct millions of 3D face scans from a large-scale 2D face database (i.e., VGGFace2) using a deep-learning-based 3D face reconstruction method (i.e., ExpNet). Then, we adopt a two-stage training approach: in the first stage, we use millions of face images to pre-train a deep convolutional neural network (DCNN), and in the second stage, we use normal component images (NCIs) of the reconstructed 3D face scans to train the DCNN. Extensive experimental results show that the proposed approach can greatly improve the rank-1 score of 3D face recognition on the FRGC v2.0, Bosphorus, and BU-3DFE 3D face databases, compared with models trained on 2D face images. Finally, our proposed approach achieves state-of-the-art rank-1 scores on the FRGC v2.0 (97.6%), Bosphorus (98.4%), and BU-3DFE (98.8%) databases. The experimental results show that the reconstructed 3D facial surfaces are useful, and that our 2D-aided deep 3D face recognition framework is meaningful in the face of the scarcity of 3D faces.
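A normal component image encodes per-pixel surface normals of the reconstructed face as channels. The finite-difference sketch below is one plausible way to derive such normals from a depth grid; it is an illustrative assumption, not ExpNet's actual NCI-rendering pipeline.

```python
import math

def normal_map(depth):
    """Per-pixel unit surface normals from a depth grid via forward
    differences: n is proportional to (-dz/dx, -dz/dy, 1)."""
    h, w = len(depth), len(depth[0])
    normals = [[None] * (w - 1) for _ in range(h - 1)]
    for y in range(h - 1):
        for x in range(w - 1):
            dzdx = depth[y][x + 1] - depth[y][x]
            dzdy = depth[y + 1][x] - depth[y][x]
            n = (-dzdx, -dzdy, 1.0)
            norm = math.sqrt(sum(c * c for c in n))
            normals[y][x] = tuple(c / norm for c in n)
    return normals

# A plane tilted along x: normals should lean opposite the depth gradient.
plane = [[float(x) for x in range(3)] for _ in range(3)]
nm = normal_map(plane)
```

Feeding such geometry-derived images to the DCNN is what lets 2D-trained features adapt to 3D surface data in the second training stage.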
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of these datasets is small, partially labeled, and rarely investigates subjects with severe tumors. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
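The core idea of conditioning segmentation on text embeddings can be sketched with a toy nearest-embedding classifier. This is a stand-in for the CLIP-driven head, not the paper's architecture; the 2-D embeddings, class names, and voxel features are all hypothetical.

```python
def classify_voxels(voxel_feats, class_embeds):
    """Assign each voxel feature to the class whose text embedding gives
    the largest dot product -- a toy stand-in for conditioning a
    segmentation head on CLIP text embeddings."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return [max(class_embeds, key=lambda name: dot(feat, class_embeds[name]))
            for feat in voxel_feats]

# Hypothetical 2-D embeddings and voxel features for illustration.
class_embeds = {"liver": [1.0, 0.0], "liver_tumor": [0.0, 1.0]}
voxel_feats = [[0.9, 0.1], [0.2, 0.8]]
labels = classify_voxels(voxel_feats, class_embeds)
```

Because classes are identified by embeddings rather than fixed output channels, adding a new class amounts to adding a new embedding, which is what makes extension without catastrophic forgetting feasible.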
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative: the goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics. We also address the preservation of scale information, which is a powerful aid to image understanding but has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose a non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
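The multi-task objective at one pyramid scale can be sketched as a sum of a pixel-restoration term and a siamese feature-comparison term. This is a minimal sketch under stated assumptions, not the paper's loss: the weight `w` and the (1 - cosine similarity) comparison term are illustrative choices, and the full framework sums such terms over multiple scales.

```python
def multi_task_loss(recon, target, feat_a, feat_b, w=1.0):
    """One scale of a PCRLv2-style objective: pixel-restoration MSE plus
    a siamese feature-comparison term (1 - cosine similarity)."""
    mse = sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)
    na = sum(a * a for a in feat_a) ** 0.5
    nb = sum(b * b for b in feat_b) ** 0.5
    cos = sum(a * b for a, b in zip(feat_a, feat_b)) / (na * nb)
    return mse + w * (1.0 - cos)
```

The restoration term forces pixel-level detail into the representation, while the comparison term preserves the view-invariant semantics that comparative SSL targets.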
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and the need for fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, and cardinality. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
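The parallel decoding that makes Muse faster than token-by-token autoregressive sampling can be sketched in MaskGIT style: start from a fully masked token sequence and commit the most confident predictions each step. This is a toy sketch, not Muse's implementation; the confidence schedule and the dummy predictor below are assumptions for illustration.

```python
import random

def parallel_decode(num_tokens, predict, steps=4, seed=0):
    """MaskGIT-style parallel decoding: start fully masked and, at each
    step, commit the most confident predictions among the still-masked
    positions."""
    rng = random.Random(seed)
    MASK = None
    tokens = [MASK] * num_tokens
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        # predict(tokens, i, rng) -> (token, confidence) for a masked position
        preds = {i: predict(tokens, i, rng) for i in masked}
        # commit a growing fraction of the remaining positions each step
        keep = max(1, round(len(masked) * (step + 1) / steps))
        for i in sorted(masked, key=lambda j: preds[j][1], reverse=True)[:keep]:
            tokens[i] = preds[i][0]
    return tokens

# Dummy predictor: the token depends only on its position; confidence is random.
decoded = parallel_decode(8, lambda toks, i, rng: (i % 4, rng.random()))
```

A whole image is thus produced in a small fixed number of steps rather than one token per step, which is the efficiency argument made against autoregressive models like Parti.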
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
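The greedy conditional-mutual-information policy can be made concrete on discrete toy data, where the CMI is computable exactly from the empirical distribution. This is the oracle version of the policy; the paper's contribution is to replace this oracle with a learned, amortized network. The toy dataset and budget below are illustrative assumptions.

```python
from collections import Counter
from math import log2

def cmi(samples, y_idx, x_idx, cond_idx):
    """Exact conditional mutual information I(Y; X | Z) under the
    empirical distribution of discrete sample tuples."""
    n = len(samples)
    jz = Counter(tuple(s[i] for i in cond_idx) for s in samples)
    jxz = Counter((s[x_idx],) + tuple(s[i] for i in cond_idx) for s in samples)
    jyz = Counter((s[y_idx],) + tuple(s[i] for i in cond_idx) for s in samples)
    jxyz = Counter((s[x_idx], s[y_idx]) + tuple(s[i] for i in cond_idx) for s in samples)
    total = 0.0
    for (x, y, *z), c in jxyz.items():
        z = tuple(z)
        pxyz, pz = c / n, jz[z] / n
        pxz, pyz = jxz[(x,) + z] / n, jyz[(y,) + z] / n
        total += pxyz * log2(pxyz * pz / (pxz * pyz))
    return total

def greedy_dfs(samples, y_idx, feature_idxs, budget):
    """Greedy policy: repeatedly query the feature with the largest CMI
    with the label, conditioned on the features already observed."""
    selected = []
    for _ in range(budget):
        remaining = [i for i in feature_idxs if i not in selected]
        selected.append(max(remaining, key=lambda i: cmi(samples, y_idx, i, selected)))
    return selected

# Toy data: columns (x0, x1, x2, y) with y = x0 OR x1; x2 is pure noise.
data = [(x0, x1, x2, x0 | x1) for x0 in (0, 1) for x1 in (0, 1) for x2 in (0, 1)]
order = greedy_dfs(data, y_idx=3, feature_idxs=[0, 1, 2], budget=2)
```

With a budget of two queries, the policy selects the two informative features and ignores the noise feature, which is the behavior the amortized network is trained to recover.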
Human parsing aims to partition the humans in an image or video into multiple pixel-level semantic parts. In the last decade, it has gained significantly increased interest in the computer vision community and has been utilized in a broad range of practical applications, from security monitoring, to social media, to visual special effects, just to name a few. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions remain unclear. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, by introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.